
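* This do-file creates a crosswalk between old (cic02) and new (cic03) industry
* codes at 3-digit precision, applying the decisions recorded in
* industrycodecheck.xlsx to one-to-many matches.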
capture log close
log using tabdata5-1empcen.log, replace text

* Load the cic02-cic03 code pairs (forward slashes work on every platform)
use "../../data/empcensus/source/CIC_ADJ-02-03.dta", clear
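* Convert the codes to strings and extract their 3-digit prefixes.
* Note: numeric codes cannot carry leading zeros, so a 4-digit code such as
* 0110 becomes "110"; if such codes occur, a zero-padded conversion such as
* string(cic02, "%04.0f") is needed before taking substr(..., 1, 3).
tostring cic02 cic03, gen(cs02 cs03)
gen cs02_3d = substr(cs02, 1, 3)
gen cs03_3d = substr(cs03, 1, 3)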
* Flag cs02 codes that map to more than one cs03 code (check > 0)
sort cs02
duplicates tag cs02, gen(check)
order check cs02 cs03

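* Clear the flag for one-to-many matches that collapse at the 3-digit level:
* after sorting, cs03_3d[1] and cs03_3d[_N] are the smallest and largest
* 3-digit codes within each cs02. If they are equal, every cs03 linked to that
* cs02 shares one 3-digit code, so the duplicate is harmless.
sort cs02 cs03_3d
by cs02: gen first = cs03_3d[1]
by cs02: gen last  = cs03_3d[_N]
replace check = 0 if check != 0 & first == last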
* Load the decisions from industrycodecheck.xlsx into a temporary file
preserve

import excel using "../../data/empcensus/source/industrycodecheck.xlsx", clear firstrow allstring
tempfile manu
save `manu'

restore
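* Attach the decisions (all Excel columns are strings because of allstring)
* and drop the pairs marked drop == "1". Because nogen is used without keep(),
* rows that appear only in the Excel file are kept as well.
merge 1:1 cs02 cs03 cs02_3d cs03_3d using `manu', nogen
drop if drop == "1"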

* Keep one row per cs02 code and 3-digit cs03 code
duplicates drop cs02 cs03_3d, force

* Save the correspondence
keep cs02 cs03 cic03 cic02
save "../../data/empcensus/generated/cic_correspondence.dta", replace

log close
